Optimize reflection of F# types, part 2 #9784

kerams · 2020-07-25T09:45:13Z

Continuation of #9714.

PreComputeRecordConstructor
PreComputeRecordFieldReader
PreComputeRecordReader - reworked from last time
PreComputeUnionConstructor
PreComputeUnionReader
PreComputeUnionTagReader

PreComputeTupleReader and PreComputeTupleConstructor are a little more complicated, so perhaps another time.

The rest of the PreCompute family only returns the corresponding System.Reflection.MethodBase objects for further reflection use.
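For readers unfamiliar with the PreCompute family, here is a minimal usage sketch of two of the functions listed above. The `Person` record is hypothetical and exists only for illustration. The point is that the precomputation happens once and the returned function is what gets called repeatedly.

```fsharp
open Microsoft.FSharp.Reflection

// Hypothetical record type, used only for illustration.
type Person = { Name: string; Age: int }

// Precompute once (this is where the reflection work happens)...
let makePerson : obj[] -> obj =
    FSharpValue.PreComputeRecordConstructor(typeof<Person>)

let readPerson : obj -> obj[] =
    FSharpValue.PreComputeRecordReader(typeof<Person>)

// ...then call the returned functions as often as needed.
let ada = makePerson [| box "Ada"; box 36 |] :?> Person
let fields = readPerson (box ada)
```

The benchmark numbers in this PR measure the cost of calling the returned function, not the cost of the precomputation itself.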

| Type | Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| PreComputeRecordConstructor | Reflection | 554.210 ns | 3.3056 ns | 3.0921 ns | 0.0134 | - | - | 112 B |
| PreComputeRecordConstructor | Compiled | 9.999 ns | 0.0416 ns | 0.0389 ns | 0.0057 | - | - | 48 B |
| PreComputeRecordFieldReader | Reflection | 112.941 ns | 1.9830 ns | 1.8549 ns | 0.0029 | - | - | 24 B |
| PreComputeRecordFieldReader | Compiled | 3.431 ns | 0.1047 ns | 0.1120 ns | 0.0029 | - | - | 24 B |
| PreComputeRecordReader | Reflection | 562.094 ns | 10.2099 ns | 9.5503 ns | 0.0134 | - | - | 112 B |
| PreComputeRecordReader | Compiled | 22.212 ns | 0.2661 ns | 0.2489 ns | 0.0134 | - | - | 112 B |
| PreComputeUnionConstructor | Reflection | 473.935 ns | 5.1988 ns | 4.8629 ns | 0.0105 | - | - | 88 B |
| PreComputeUnionConstructor | Compiled | 7.737 ns | 0.0595 ns | 0.0497 ns | 0.0048 | - | - | 40 B |
| PreComputeUnionReader | Reflection | 318.424 ns | 4.2661 ns | 3.9905 ns | 0.0086 | - | - | 72 B |
| PreComputeUnionReader | Compiled | 13.706 ns | 0.3290 ns | 0.4822 ns | 0.0086 | - | - | 72 B |
| PreComputeUnionTagReader | Reflection | 373.670 ns | 2.4971 ns | 2.3358 ns | 0.0029 | - | - | 24 B |
| PreComputeUnionTagReader | Compiled | 6.567 ns | 0.1600 ns | 0.2136 ns | - | - | - | - |
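To make the "Reflection" versus "Compiled" distinction concrete, here is a simplified sketch of the general technique: build an expression tree once, compile it to a delegate, and call the delegate afterwards. This is not the code from this PR. `Point`, `reflectionReader`, and `compiledReader` are hypothetical names, and the real implementation in reflect.fs handles more cases.

```fsharp
open System
open System.Linq.Expressions

// Hypothetical record type, used only for illustration.
type Point = { X: int; Y: int }

// Reflection-based reader: every call goes through PropertyInfo.GetValue.
let reflectionReader (recordType: Type) (fieldName: string) : obj -> obj =
    let prop = recordType.GetProperty(fieldName)
    fun record -> prop.GetValue(record)

// Compiled reader: the property access is baked into a delegate up front.
let compiledReader (recordType: Type) (fieldName: string) : obj -> obj =
    let prop = recordType.GetProperty(fieldName)
    let record = Expression.Parameter(typeof<obj>, "record")
    // Equivalent to: fun (record: obj) -> box ((record :?> 'Record).Field)
    let body =
        Expression.Convert(
            Expression.Property(Expression.Convert(record, recordType), prop),
            typeof<obj>)
    let compiled = Expression.Lambda<Func<obj, obj>>(body, record).Compile()
    fun r -> compiled.Invoke(r)

let readX = compiledReader typeof<Point> "X"
let x = readX (box { X = 1; Y = 2 }) :?> int
```

Compiling the expression is comparatively expensive, so this only pays off when the returned function is called many times.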

kerams · 2020-07-25T09:49:20Z

src/fsharp/FSharp.Core/reflect.fs

-                    (fun (obj: obj) ->
-                        let m2b = typ.GetMethod("GetTag", BindingFlags.Static ||| bindingFlags, null, [| typ |], null)
-                        m2b.Invoke(null, [|obj|]) :?> int)
+                    let m2b = typ.GetMethod("GetTag", BindingFlags.Static ||| bindingFlags, null, [| typ |], null)
+                    (fun (obj: obj) -> m2b.Invoke(null, [|obj|]) :?> int)


There's probably no need to look up the method on every invocation. I didn't know how to test this specifically though. When is it ever the case that a DU doesn't have a Tag property?

cartermp · 2020-07-25

A DU always has a tag as far as I know: struct/ref type, single case/multi case all have tag properties.

kerams · 2020-07-25

Hmm, then I'm not sure if this branch ever executes.

KevinRansom · 2020-07-25

I think Paul had a PR to do tagless DUs a few (a lot of) years ago.

KevinRansom

Looks good.

Thank you for this

cartermp

I was wondering if there was a way to further reduce allocations/CPU time by making the loop not call into an enumerator, but that's clearly a micro-optimization at this point.

kerams · 2020-07-26T05:41:15Z

@cartermp, what loop? The only loops I see are those that are used to create expression trees during the precomputation. There are no loops inside the compiled delegates.

cartermp · 2020-07-26T16:12:19Z

In this case I'm referring to a loop like this: https://github.com/dotnet/fsharp/pull/9784/files#diff-54378f53a8612b2adf50a6efb115ba5aR134

The compiler rewrites this to use an enumerator no matter which kind of for loop you use; see:

- current impl
- for x = 0 to loop

Rewriting it to a while loop and manually bumping the index removes the enumerator, but it's unlikely to matter much from a perf standpoint. Just kind of weird that trivially changing the for loop had no effect when it usually does.
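As a rough illustration of the two loop shapes being compared, here is a minimal sketch. The function names are hypothetical, and this makes no claim about what IL the compiler emits for either form in any given compiler version. It only shows the manual rewrite being described.

```fsharp
// Range-based for loop, the shape that may be lowered to an enumerator.
let squaresWithFor (n: int) : int[] =
    let result = Array.zeroCreate<int> n
    for i in 0 .. n - 1 do
        result.[i] <- i * i
    result

// Manual rewrite: a while loop with an explicitly incremented index.
let squaresWithWhile (n: int) : int[] =
    let result = Array.zeroCreate<int> n
    let mutable i = 0
    while i < n do
        result.[i] <- i * i
        i <- i + 1
    result
```

Both functions compute the same array; any difference is purely in the generated code.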

kerams · 2020-07-26T16:18:08Z

Yeah, I'd be all for altering the loop were this happening inside of the delegate. Don't think it matters at all during prep stage.

abelbraaksma · 2020-07-26T17:26:28Z

In hot paths it matters for ints; it's about 4x slower when it becomes an enumeration. That may not matter much here, though. There's an issue open to better optimize this, and the example from @cartermp will help there.

KevinRansom · 2020-07-27T20:39:42Z

This func is compileUnionCaseConstructorFunc ... it should not happen frequently.

Daniel-Svensson · 2020-07-28T13:10:21Z

@kerams if you want to further improve performance ~~(by 6-7 times on netcore 3.1)~~ for the "get all properties" case, you can modify the expression to initialize the array in reverse order to avoid array bounds checks.

I added a method to your benchmark from the previous PR and have the results lying around at home; I can post them next week if that is of interest.
But it might be regarded as a micro-optimization and not too relevant here, since one might expect Expression.InitArray to be as fast as possible, and it might be a flaw in the benchmark code.

kerams · 2020-07-28T13:50:36Z

@Daniel-Svensson, are you saying

var a = new int[5];
a[0] = 0;
a[1] = 0;
a[2] = 0;
a[3] = 0;
a[4] = 0;

is 6 times slower than

var a = new int[5];
a[4] = 0;
a[3] = 0;
a[2] = 0;
a[1] = 0;
a[0] = 0;

??

Apologies for using 'the other sharp'.

Daniel-Svensson · 2020-07-28T16:41:44Z

@kerams actually I believe I must have a bug in my F# code (I cannot check it this week); it was a quick hack late before bedtime.
If the length is unknown to the JIT, the reverse order is much faster, but thinking more about it, I would not expect such large improvements, only a double-digit percentage. There are many such "hacks" in the runtime.
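A minimal sketch of the pattern under discussion, for an array whose length the JIT cannot see at the call site. The idea is that writing the highest index first proves the array is long enough, so the checks on the lower indices may be elided. Whether that happens depends on the runtime version, and these hypothetical functions show only the shape of the code, not a measured result.

```fsharp
// Forward order: each store may carry its own bounds check.
let fillForward (a: int[]) =
    a.[0] <- 10
    a.[1] <- 11
    a.[2] <- 12
    a.[3] <- 13

// Reverse order: the first store checks the highest index, which
// implies that the remaining (lower) indices are in range too.
let fillReverse (a: int[]) =
    a.[3] <- 13
    a.[2] <- 12
    a.[1] <- 11
    a.[0] <- 10

let forward = Array.zeroCreate<int> 4
let reverse = Array.zeroCreate<int> 4
fillForward forward
fillReverse reverse
```

The two functions leave the array in the same state, so the choice between them is purely a performance question that would need a benchmark to settle.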

kerams added 2 commits July 24, 2020 21:58

* e89b10d Compile PreComputeRecordConstructor, PreComputeRecordReader, PreComputeRecordFieldReader
* 9ee14f1 Compile PreComputeUnionConstructor, PreComputeUnionTagReader, PreComputeUnionReader

KevinRansom approved these changes Jul 25, 2020

cartermp approved these changes Jul 25, 2020

KevinRansom merged commit a4e0947 into dotnet:master Jul 25, 2020

kerams deleted the reflection-2 branch July 26, 2020 05:38

kerams mentioned this pull request Mar 13, 2021

Optimize PreComputeTupleConstructor #11230

Merged
